
Find predicted value for response variable

The script below shows how to use the margins() option to retrieve the predicted value of the response variable in a linear or binary regression model. The prediction is calculated in the traditional way, at the means of the explanatory variables: each average value is multiplied by the corresponding coefficient estimate, and the products are summed together with the constant term. For binary models estimated with logit and probit, the sum is then transformed so that the predicted value can be interpreted in the same way as for regress. The script also shows how to calculate the predicted value of the response variable manually, as described.
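The calculation described above can be written out as follows. The notation is an assumption made for this sketch and does not come from the script: $\hat{\beta}_0$ is the constant term, $\hat{\beta}_k$ the coefficient estimates, $\bar{x}_k$ the averages of the explanatory variables, and $\Phi$ the standard normal cumulative distribution function.

```latex
% Linear predictor evaluated at the means (used directly for regress)
\hat{y} = \hat{\beta}_0 + \sum_{k} \hat{\beta}_k \, \bar{x}_k

% logit: logistic transformation of the linear predictor
\hat{p}_{\text{logit}} = \frac{1}{1 + e^{-\hat{y}}}

% probit: standard normal CDF of the linear predictor
\hat{p}_{\text{probit}} = \Phi(\hat{y})
```

For group-specific predictions such as margins(man), the average of the dummy variable is replaced by 0 or 1, while the other variables are kept at their averages.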

require no.ssb.fdb:34 as db

create-dataset regressiondata
import db/INNTEKT_WLONN 2022-12-31 as salary
import db/INNTEKT_BER_BRFORM 2022-12-31 as wealth
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthyearmonth
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_STATUSKODE 2022-01-01 as residency_status

keep if residency_status == '1'

generate age = 2022 - int(birthyearmonth/100)
generate man = gender == '1'
generate highsalary = salary > 800000
generate highwealth = wealth > 4000000


//------------------- regress ---------------------

//Retrieving the predicted value for the entire analysis population
regress salary age man wealth, margins()

//Retrieving the predicted value for the groups men vs. women by using the dummy variable man
regress salary age man wealth, margins(man)

//Alternatively, retrieving individual predicted values based on each unit's own values on the explanatory variables
regress-predict salary age man wealth, predicted(pred1)

//Finding average values - important that the population is exactly the same as for the regression model
summarize salary age man wealth pred1 if !sysmiss(salary) & !sysmiss(age) & !sysmiss(man) & !sysmiss(wealth)

//Manual calculation of margin estimates (predicted Y values) - coefficients are copied from the regress output, averages from the summarize output above
generate konst = 232295.022774
generate b_age = 5311.799072
generate b_man = 140398.92883
generate b_wealth = 0.001129
generate avg_age = 42.155463260696
generate avg_man = 0.521736587496133
generate avg_wealth = 2862060.6319129

//Total (man at its average), women (man = 0) and men (man = 1)
generate ypred_reg = konst + b_age*avg_age + b_man*avg_man + b_wealth*avg_wealth
generate ypred_reg0 = konst + b_age*avg_age + b_wealth*avg_wealth
generate ypred_reg1 = konst + b_age*avg_age + b_man + b_wealth*avg_wealth

//Manually calculated prediction values total and for the groups men vs women
summarize ypred_reg ypred_reg0 ypred_reg1


//------------------- logit -----------------------
logit highsalary age man highwealth, margins()
logit highsalary age man highwealth, margins(man)
logit-predict highsalary age man highwealth, predicted(pred2)

summarize highsalary age man highwealth pred2 if !sysmiss(highsalary) & !sysmiss(age) & !sysmiss(man) & !sysmiss(highwealth)

//Coefficients copied from the logit output, averages from the summarize output above
replace konst = -3.412074
replace b_age = 0.013134
replace b_man = 1.022665
generate b_highwealth = 1.748605
replace avg_age = 41.6484
replace avg_man = 0.5215
generate avg_highwealth = 0.241

generate ypred_log = konst + b_age*avg_age + b_man*avg_man + b_highwealth*avg_highwealth
generate ypred_log0 = konst + b_age*avg_age + b_highwealth*avg_highwealth
generate ypred_log1 = konst + b_age*avg_age + b_man + b_highwealth*avg_highwealth

//Logistic transformation of the summed values into predicted probabilities
replace ypred_log = 1 / (1 + exp(0-ypred_log))
replace ypred_log0 = 1 / (1 + exp(0-ypred_log0))
replace ypred_log1 = 1 / (1 + exp(0-ypred_log1))

summarize ypred_log ypred_log0 ypred_log1


//------------------- probit ----------------------
probit highsalary age man highwealth, margins()
probit highsalary age man highwealth, margins(man)
probit-predict highsalary age man highwealth, predicted(pred3)

summarize highsalary age man highwealth pred3 if !sysmiss(highsalary) & !sysmiss(age) & !sysmiss(man) & !sysmiss(highwealth)

//Coefficients copied from the probit output, averages from the summarize output above
replace konst = -2.008393
replace b_age = 0.008738
replace b_man = 0.563674
replace b_highwealth = 0.993583
replace avg_age = 41.6484
replace avg_man = 0.5215
replace avg_highwealth = 0.241

generate ypred_prob = konst + b_age*avg_age + b_man*avg_man + b_highwealth*avg_highwealth
generate ypred_prob0 = konst + b_age*avg_age + b_highwealth*avg_highwealth
generate ypred_prob1 = konst + b_age*avg_age + b_man + b_highwealth*avg_highwealth

//Probit transformation of the summed values into predicted probabilities using normal()
replace ypred_prob = normal(ypred_prob)
replace ypred_prob0 = normal(ypred_prob0)
replace ypred_prob1 = normal(ypred_prob1)

summarize ypred_prob ypred_prob0 ypred_prob1
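
The manual calculations in the script can be cross-checked outside microdata.no. The following is a minimal Python sketch, not part of the script itself: it reuses the coefficient estimates and averages typed into the script above, and the helper names (linear_index, logistic, normal_cdf, at_means) are hypothetical names chosen for illustration.

```python
import math


def linear_index(const, coefs, means):
    """Constant term plus the sum of coefficient * average for each variable."""
    return const + sum(coefs[name] * means[name] for name in coefs)


def logistic(z):
    """Logistic transformation, as used for the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal cumulative distribution, as used for the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def identity(z):
    """No transformation, as used for the linear model."""
    return z


def at_means(const, coefs, means, link=identity):
    """Return predictions for (total, women, men), other variables at their averages."""
    total = link(linear_index(const, coefs, means))
    women = link(linear_index(const, coefs, {**means, "man": 0.0}))
    men = link(linear_index(const, coefs, {**means, "man": 1.0}))
    return total, women, men


# Values copied from the regress part of the script
reg = at_means(
    232295.022774,
    {"age": 5311.799072, "man": 140398.92883, "wealth": 0.001129},
    {"age": 42.155463260696, "man": 0.521736587496133, "wealth": 2862060.6319129},
)

# Averages shared by the logit and probit parts of the script
binary_means = {"age": 41.6484, "man": 0.5215, "highwealth": 0.241}

log = at_means(
    -3.412074,
    {"age": 0.013134, "man": 1.022665, "highwealth": 1.748605},
    binary_means,
    logistic,
)

prob = at_means(
    -2.008393,
    {"age": 0.008738, "man": 0.563674, "highwealth": 0.993583},
    binary_means,
    normal_cdf,
)

for label, values in (("regress", reg), ("logit", log), ("probit", prob)):
    print(label, [round(v, 4) for v in values])
```

Each printed row holds the total prediction followed by the predictions for women and men. Since the inputs are the same numbers as in the script, the results should agree with the summarize output for the ypred_* variables up to rounding.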